AirBnB - A Sentiment Analysis

Introduction

What is Airbnb and how does it work?

A community built on sharing

"Airbnb began in 2008 when two designers who had space to share hosted three travelers looking for a place to stay. Now, millions of hosts and travelers choose to create a free Airbnb account so they can list their space and book unique accommodations anywhere in the world. And Airbnb experience hosts share their passions and interests with both travelers and locals."

Trusted services

"Airbnb helps make sharing easy, enjoyable, and safe. We verify personal profiles and listings, maintain a smart messaging system so hosts and guests can communicate with certainty, and manage a trusted platform to collect and transfer payments."

Location

What Information Do We Have?

The data comes from kaggle.com and covers two cities: Boston and Seattle.

There are three data files: calendar, reviews and listings. The listings data file contains one observation per listing, with basic information (location, space, host, images of the listing and host, availability), review summaries, and price. The calendar and reviews data files contain multiple entries per listing, one for each availability record and each individual review.

Questions

For anyone new to AirBnB (like me), the most obvious questions relate to:

Location

We can plot the locations for AirBnB listings for both Boston and Seattle. Surprisingly (for me) there are listings dotted around each city.

Price

The breakdown of prices per zipcode for each city is:

Prices shown in red are for zipcodes where the price is at or below the 25th percentile.
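As a rough sketch of how the red zipcodes could be picked out, the snippet below averages the price per zipcode and flags the zipcodes whose average is at or below the 25th percentile for their city. The sample rows, the function name low_price_zipcodes and the choice to compare zipcode averages within each city are all assumptions made for illustration; the original analysis may have computed the cut-off differently.

```python
from collections import defaultdict
from statistics import mean, quantiles

# Hypothetical sample rows: (city, zipcode, nightly price in dollars).
listings = [
    ("Boston", "02116", 250.0),
    ("Boston", "02116", 210.0),
    ("Boston", "02130", 90.0),
    ("Boston", "02134", 120.0),
    ("Boston", "02215", 180.0),
    ("Seattle", "98101", 200.0),
    ("Seattle", "98103", 110.0),
    ("Seattle", "98118", 80.0),
    ("Seattle", "98122", 140.0),
]


def low_price_zipcodes(rows):
    """Return {city: zipcodes whose mean price is <= that city's 25th percentile}."""
    prices = defaultdict(lambda: defaultdict(list))
    for city, zipcode, price in rows:
        prices[city][zipcode].append(price)

    flagged = {}
    for city, by_zip in prices.items():
        zip_means = {z: mean(p) for z, p in by_zip.items()}
        # First quartile cut point across the zipcode means of this city
        # (assumption: linear interpolation, the "inclusive" method).
        q1 = quantiles(list(zip_means.values()), n=4, method="inclusive")[0]
        flagged[city] = {z for z, m in zip_means.items() if m <= q1}
    return flagged


print(low_price_zipcodes(listings))
```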

Is there information in the calendar or reviews data files that would be useful?

Reviews

We have seen that the reviews are mostly positive. We are going to analyze the reviews to see which sentiments they express, using natural language processing to scan each review for key "sentiments".

We will do this in three ways.

Naive Examination

The results for the positive reviews are not too surprising, but the results for the negative sentiments are shocking.
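To make the idea of a naive examination concrete, here is a minimal sketch that counts how often words from a positive and a negative word list appear in the reviews. The two word lists and the sample reviews are made up for illustration, and this is not necessarily the lexicon or tokenizer used in the original analysis.

```python
import re
from collections import Counter

# Tiny hypothetical lexicons, for illustration only.
POSITIVE = {"great", "clean", "lovely", "comfortable"}
NEGATIVE = {"die", "rob", "killer", "war", "dirty"}


def count_sentiment_words(reviews):
    """Count lexicon hits per word, ignoring case and context."""
    pos, neg = Counter(), Counter()
    for text in reviews:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in POSITIVE:
                pos[token] += 1
            elif token in NEGATIVE:
                neg[token] += 1
    return pos, neg


# Made-up sample reviews.
sample_reviews = [
    "Great location and a killer view of the harbour.",
    "Die Wohnung war sehr sauber.",
    "Rob was a lovely host, the room was clean.",
]

pos_counts, neg_counts = count_sentiment_words(sample_reviews)
print("positive:", dict(pos_counts))
print("negative:", dict(neg_counts))
```

Because the count looks only at individual words, every one of these harmless sample reviews registers at least one "negative" hit.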

Let's scan through some of the reviews to see where the words die, rob, killer and war appear. It turns out we only need to look at one example of each to see why those apparently shocking terms are used.
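One way to pull out such examples is a whole-word, case-insensitive search, sketched below. The helper name find_word and the sample reviews are hypothetical.

```python
import re


def find_word(reviews, word):
    """Return the reviews that contain `word` as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [text for text in reviews if pattern.search(text)]


# Made-up sample reviews.
sample_reviews = [
    "Die Wohnung war sehr sauber.",
    "We stuck to our diet thanks to the well-equipped kitchen.",
    "The rooftop has a killer view.",
]

for word in ("die", "killer"):
    print(word, "->", find_word(sample_reviews, word))
```

The word boundaries (\b) keep "diet" from being reported as a match for "die".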

Die

Killer

Rob

So some of the reviews are not in English, and some of the "sentiments" detected are due to colloquial usage. First, let's remove the non-English reviews.
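The source does not say how the non-English reviews were identified. As a stand-in, the sketch below uses a crude heuristic: a review is treated as English if a large enough share of its words are common English function words. The word list, the 0.2 threshold and the function name looks_english are all assumptions, and a dedicated language-identification library would be more robust on short or mixed-language reviews.

```python
import re

# Small hypothetical list of common English function words.
ENGLISH_STOPWORDS = {
    "the", "and", "was", "is", "a", "to", "of", "in",
    "it", "we", "very", "for", "with", "this", "our",
}


def looks_english(text, threshold=0.2):
    """Crude check: share of tokens that are common English function words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    if not tokens:
        return False
    hits = sum(1 for token in tokens if token in ENGLISH_STOPWORDS)
    return hits / len(tokens) >= threshold


# Made-up sample reviews.
sample_reviews = [
    "The room was clean and the host was very friendly.",
    "Die Wohnung war sehr sauber und ruhig.",
    "L'appartement est très propre.",
]

english_reviews = [text for text in sample_reviews if looks_english(text)]
print(english_reviews)
```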

English Only Reviews

Surprisingly, "die" remains, but "killer" (which is mainly English colloquial usage) doesn't. Unsurprisingly, "rob" still remains. Let's see why "die" is used.

OK, more colloquial usage. What about "killer"?

Word Clouds Proportional To Sentiments Expressed

Time Taken

    Grouped by city, english language only: Time Taken: 303.38720989227295 seconds
                        loaded from pickle: Time Taken: 15.748566150665283 seconds
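The gap between the two timings is what caching buys. Here is a minimal sketch of the pattern, assuming the expensive step can be wrapped in a function whose result is pickled on the first run and reloaded afterwards. The names load_or_compute and slow_analysis and the cache file name are hypothetical, and the sleep merely stands in for the real computation.

```python
import os
import pickle
import tempfile
import time


def load_or_compute(cache_path, compute):
    """Return the pickled result if it exists, otherwise compute and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    with open(cache_path, "wb") as fh:
        pickle.dump(result, fh)
    return result


def slow_analysis():
    """Stand-in for the expensive sentiment scan."""
    time.sleep(0.2)
    return {"Boston": ["great", "clean"], "Seattle": ["lovely"]}


cache_file = os.path.join(tempfile.mkdtemp(), "sentiments.pkl")

start = time.time()
first = load_or_compute(cache_file, slow_analysis)
print("computed: Time Taken:", time.time() - start, "seconds")

start = time.time()
second = load_or_compute(cache_file, slow_analysis)
print("loaded from pickle: Time Taken:", time.time() - start, "seconds")
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.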

Do the dataframes include the same listings?

All listings in the listings data also appear in the calendar data file, but close to 19% of them are missing from the reviews data.
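A check like this comes down to comparing sets of listing IDs. The sketch below uses made-up IDs and a hypothetical helper, missing_share; in the real data the IDs would come from the listing identifier column of each file.

```python
def missing_share(listing_ids, other_ids):
    """Return (IDs absent from other_ids, percentage of listings they represent)."""
    listing_ids = set(listing_ids)
    missing = listing_ids - set(other_ids)
    return missing, 100 * len(missing) / len(listing_ids)


# Made-up IDs: calendar and reviews hold several rows per listing.
listings_ids = [1, 2, 3, 4, 5]
calendar_ids = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
reviews_ids = [1, 1, 1, 2, 3, 3, 5]

print(missing_share(listings_ids, calendar_ids))
print(missing_share(listings_ids, reviews_ids))
```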

          city_merge
          Boston    127.976
          Seattle   173.926
          Name: price, dtype: float64

Let's Remove Non-English Reviews